AMENDMENTS TO THE CLAIMS:

1. (Currently amended) A method of executing a linear algebra subroutine, said method 
comprising: 

for an execution code controlling an operation of a floating point unit (FPU) 
performing a linear algebra subroutine execution, unrolling inserting instructions to prefetch 
timely move data into a cache providing data into said FPU, said unrolling causing said
instructions to touch data anticipated thereby improving an efficiency for said linear algebra
subroutine execution. 

2. (Currently amended) The method of claim 1, wherein said prefetching timely moving data 
is accomplished by utilizing scheduling move type instructions into time slots caused by a
difference between a time to execute instructions in said subroutine execution process and a time
to load said data existing in a Level 3 Dense Linear Algebra Subroutine.

3. (Currently amended) The method of claim 1, wherein said matrix linear algebra subroutine 
comprises a matrix multiplication operation. 

4. (Currently amended) The method of claim 1, wherein said matrix linear algebra subroutine 
comprises a more efficient equivalent of a subroutine from a LAPACK (Linear Algebra 
PACKage). 

5. (Currently amended) The method of claim 4 claim 1, wherein said LAPACK linear algebra
subroutine comprises invokes a BLAS Level 3 L1 cache kernel.

6. (Currently amended) An apparatus, comprising:

a memory to store matrix data to be used for processing in a linear algebra program; 

a floating point unit (FPU) to perform said processing; 

a load/store unit (LSU) to load data to be processed by said FPU, said LSU loading
said data into a plurality of floating point registers (FRegs); and

a cache to store data from said memory and provide said data to said FRegs,
wherein said matrix data in said memory is touched timely moved by inserting moving
instructions to be loaded into said cache prior to a need for said data to be in said FRegs for
said processing.

7. (Original) The apparatus of claim 6, wherein said linear algebra program comprises a 
matrix multiplication operation. 

8. (Currently amended) The apparatus of claim 6, wherein said linear algebra program 
comprises a more efficient equivalent of a subroutine from a LAPACK (Linear Algebra 
PACKage). 

9. (Currently amended) The apparatus of claim 8 6, wherein said LAPACK subroutine 
processing comprises invoking a BLAS Level 3 L1 cache kernel.

10. (Currently amended) The apparatus of claim 6, further comprising: 

a compiler as modified to incorporate linear algebra theory and techniques to 
automatically generate instructions for said touching inserting said moving instructions.

11. (Currently amended) The apparatus of claim 10, wherein said moving instructions cause
a prefetching of said data by utilizing are inserted into time slots caused by a difference between a
time to execute instructions in said subroutine execution process and a time to load said data
existing in a Level 3 Dense Linear Algebra Subroutine.

12. (Currently amended) A signal bearing computer-readable storage medium tangibly
embodying a program of machine-readable instructions executable by a digital processing 
apparatus to perform a method of executing linear algebra subroutines, said method 
comprising: 

for an execution code controlling an operation of a floating point unit (FPU) 
performing a linear algebra subroutine execution, unrolling inserting instructions to prefetch
timely move data into a cache providing data into said FPU, said unrolling causing said 
instructions to touch data anticipated thereby improving an efficiency for said linear algebra 
subroutine execution. 

13. (Currently amended) The signal bearing computer-readable storage medium of claim 12, 
wherein said prefetching data timely moving data is accomplished by utilizing inserting move 
type instructions into time slots caused by a difference between a time to execute instructions in
said subroutine execution process and a time to load said data existing in a Level 3 Dense Linear
Algebra Subroutine.

14. (Currently amended) The signal bearing computer-readable storage medium of claim 12, 
wherein said matrix linear algebra subroutine comprises a matrix multiplication operation. 






Serial No. 10/671,889 

Docket No. YOR920030170US1 (YOR.464) 



15. (Currently amended) The signal bearing computer-readable storage medium of claim 12, 
wherein said matrix linear algebra subroutine comprises a more efficient equivalent of a 
subroutine from a LAPACK (Linear Algebra PACKage). 

16. (Currently amended) The signal bearing computer-readable storage medium of claim 12, 
wherein said LAPACK linear algebra subroutine comprises invokes a BLAS Level 3 L1 cache
kernel.

17. (Currently amended) A method of providing a service involving at least one of solving 
and applying a scientific/engineering problem, said method comprising at least one of: 

using a linear algebra software package that computes one or more matrix 
subroutines, wherein said linear algebra software package generates an execution code 
controlling an operation of a floating point unit (FPU) performing a linear algebra subroutine 
execution, unrolling such that instructions are inserted to prefetch timely move data into a 
cache providing data into for said FPU, said unrolling causing said instructions to touch data
anticipated thereby improving an efficiency for said linear algebra subroutine execution; 

providing a consultation for solving a scientific/engineering problem using said linear 
algebra software package; 

transmitting a result of said linear algebra software package on at least one of a 
network, a signal-bearing medium containing machine-readable data representing said result, 
and a printed version representing said result; and 

receiving a result of said linear algebra software package on at least one of a network, 
a signal-bearing medium containing machine-readable data representing said result, and a 
printed version representing said result. 

18. (Currently amended) The method of claim 17, wherein said matrix linear algebra
subroutine comprises a more efficient equivalent of a subroutine from a LAPACK (Linear 
Algebra PACKage). 

19. (Currently amended) The method of claim 18 17, wherein said LAPACK linear algebra
subroutine comprises invokes a BLAS Level 3 L1 cache kernel.

20. (New) The method of claim 1, further comprising: 

modifying a compiler to incorporate linear algebra theory and techniques to 
automatically generate instructions for said inserting said instructions. 
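
For illustration only, the general technique recited in the claims above (inserting instructions into a matrix multiplication loop so that data is moved into the cache before the FPU needs it) might be sketched in C as follows. This is not the applicant's implementation and forms no part of the claims. The function name matmul_prefetch and the PREFETCH_AHEAD distance are hypothetical, and the sketch assumes a GCC- or Clang-compatible compiler that provides the __builtin_prefetch hint; on other compilers the hint is simply omitted.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements; a real kernel would tune it
 * to the gap between load latency and FPU execution time. */
#define PREFETCH_AHEAD 8

/* Computes C += A * B for n x n row-major matrices of doubles. */
void matmul_prefetch(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            const double aik = a[i * n + k];
            const double *brow = &b[k * n];
            double *crow = &c[i * n];
            for (size_t j = 0; j < n; j++) {
#if defined(__GNUC__) || defined(__clang__)
                /* Inserted "move type" hint: request a later element of the
                 * current row of B (read access, high temporal locality). */
                if (j + PREFETCH_AHEAD < n)
                    __builtin_prefetch(&brow[j + PREFETCH_AHEAD], 0, 3);
#endif
                crow[j] += aik * brow[j];
            }
        }
    }
}
```

The prefetch is only a hint: it does not change the computed result, and whether it improves efficiency depends on the target processor and cache hierarchy.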



